Refactor container chain spawning logic to not require embedded node outside of collation #627
Conversation
@@ -796,7 +810,7 @@ fn handle_update_assignment_state_change(
 /// interrupted before it finished downloading the state, in that case the node will use warp sync.
 /// If it was interrupted during the block history download, the node will use full sync but also
 /// finish the block history download in the background, even if sync mode is set to full sync.
-fn select_sync_mode(
+pub fn select_sync_mode_using_client(
In this function we only need the `orchestrator_client` to check an edge case for when the container chain is still at block 0. We could also use the `relay_chain_interface` and get the latest finalized block from there; would that be better for the ws node?
We could, yes; in the end we are fetching that information from the relay itself.
Which function in the relay interface should be called? Indeed, that would allow us to get rid of the generic and not need the orchestrator client.
Probably `paras->heads`, but you would need to parse the header to get the block number. There are some utility functions in `pallet_author_noting`, such as `author_from_log`. @girazoki any better ideas? Maybe there is some other storage that only exists after the parachain has included its first block?
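If it helps, here is a rough, dependency-free sketch of that approach. This is not the `pallet_author_noting` helpers; it assumes the `HeadData` wrapper (a SCALE length-prefixed byte vector) has already been stripped from the `paras.heads` value, leaving raw header bytes: a 32-byte parent hash followed by a compact-encoded block number.

```rust
// Hedged sketch: extract the block number from a SCALE-encoded Substrate
// header as stored under paras.heads on the relay chain.
fn block_number_from_header(header: &[u8]) -> Option<u32> {
    let rest = header.get(32..)?; // skip the 32-byte parent_hash
    let first = *rest.first()?;
    match first & 0b11 {
        // single-byte mode: value fits in 6 bits
        0b00 => Some((first >> 2) as u32),
        // two-byte mode: little-endian u16, shifted past the mode bits
        0b01 => Some((u16::from_le_bytes([*rest.get(0)?, *rest.get(1)?]) >> 2) as u32),
        // four-byte mode: little-endian u32, shifted past the mode bits
        0b10 => Some(
            u32::from_le_bytes([*rest.get(0)?, *rest.get(1)?, *rest.get(2)?, *rest.get(3)?]) >> 2,
        ),
        // big-integer mode only appears for values >= 2^30; not handled here
        _ => None,
    }
}

fn main() {
    // A fake header for block 5: 32-byte parent hash, then compact(5) = 5 << 2.
    let mut header = vec![0u8; 32];
    header.push(5u8 << 2);
    assert_eq!(block_number_from_header(&header), Some(5));
    // A block number > 0 means the parachain has already included a block,
    // which is exactly the edge case select_sync_mode needs to detect.
}
```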
Can we tackle that in a separate PR?
Yes, let's do it as a separate PR. But it would definitely make it more compatible with solo-chains.
Coverage Report

@@ Coverage Diff @@
##            master   jeremy-orchestrator-spawner-without-client    +/-   ##
==============================================================================
- Coverage    67.32%   66.88%   -0.44%
  Files          255      255
+ Lines        44448    44587    +139
==============================================================================
- Hits         29922    29819    -103
+ Misses       14526    14768    +242
@@ -49,7 +49,7 @@ yargs(hideBin(process.argv))
     }
     process.stdout.write(`Done ✅\n`);
     const onChainGenesisData = await api.createType(
-        "TpContainerChainGenesisDataContainerChainGenesisData",
+        "DpContainerChainGenesisDataContainerChainGenesisData",
Remember to mention this in the breaking changes; this will probably break all the registration scripts we have.
why so?
Anything that uses `api.createType("Tp...` instead of `Dp` will stop working?
Yes, but I don't think we use it.
Maybe the dapp does.
@@ -235,6 +246,8 @@ async fn try_spawn(
         );
 
+        if !start_collation {
+            collation_params = None;
I forgot why we used both `validator && start_collation` instead of just one boolean, here is a summary:

The reason `start_collation` exists as a separate arg to `validator` is that in the old days, instead of restarting the container chain node, we used `collate_on` to send a message to enable the collator. So we would have to pass `collation_params` set to `Some` and `start_collation` set to `false`, and then use the `collate_on` closure to start collation later.

Now we don't have that `collate_on`, so just setting it to `None` here is ok. Although I'm considering bringing `collate_on` back for parathreads, but that's an issue for the future.
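For future readers, a tiny sketch of the two flows. All names here are hypothetical stand-ins, not the real spawner API:

```rust
struct CollationParams; // stand-in for the real collation parameters

// Old flow: the node kept the params around even when collation was off,
// and the returned `collate_on`-style closure enabled it later without
// restarting the node.
fn spawn(params: Option<CollationParams>, start_collation: bool) -> impl FnOnce() {
    if start_collation && params.is_some() {
        println!("collating from startup");
    }
    move || {
        if params.is_some() {
            println!("collate_on: collation enabled after spawn");
        }
    }
}

fn main() {
    // Old flow: Some(params) + start_collation = false, collate_on later.
    let collate_on = spawn(Some(CollationParams), false);
    collate_on();

    // New flow (this PR): with no collate_on, not collating simply means
    // the params can be dropped, as the diff above does.
    let mut params = Some(CollationParams);
    let start_collation = false;
    if !start_collation {
        params = None;
    }
    assert!(params.is_none());
}
```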
Maybe we should add a comment for this then.
I can do that in the next refactor
@@ -70,11 +69,21 @@ const MAX_DB_RESTART_TIMEOUT: Duration = Duration::from_secs(60);
 /// Assuming a syncing speed of 100 blocks per second, this will take 5 minutes to sync.
 const MAX_BLOCK_DIFF_FOR_FULL_SYNC: u32 = 30_000;
 
+pub trait TSelectSyncMode:
+    Send + Sync + Clone + 'static + (Fn(bool, ParaId) -> sc_service::error::Result<SyncMode>)
Using `Fn()` as a trait bound is very ugly; I promise to remove this trait as soon as possible in a future PR.
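For reference, this alias-trait pattern only works together with a blanket impl. Here is a minimal self-contained sketch, with stand-in types replacing the real `sc_service::SyncMode` and cumulus `ParaId`:

```rust
#[derive(Debug)]
enum SyncMode {
    Full,
    Warp,
}
type ParaId = u32;
type Result<T> = std::result::Result<T, String>;

// The "alias trait": it adds no methods of its own; the blanket impl below
// makes every closure with a matching signature implement it automatically.
trait TSelectSyncMode:
    Send + Sync + Clone + 'static + (Fn(bool, ParaId) -> Result<SyncMode>)
{
}

impl<T> TSelectSyncMode for T where
    T: Send + Sync + Clone + 'static + (Fn(bool, ParaId) -> Result<SyncMode>)
{
}

fn main() {
    // Any closure with the right signature satisfies the trait:
    let select = |db_exists: bool, _para_id: ParaId| -> Result<SyncMode> {
        Ok(if db_exists { SyncMode::Full } else { SyncMode::Warp })
    };

    // The Fn supertrait lets callers invoke it with plain call syntax.
    fn spawn_with(select_sync_mode: impl TSelectSyncMode) {
        println!("{:?}", select_sync_mode(false, 2000));
    }
    spawn_with(select);
}
```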
Haha, as we discussed we should be able to remove `SelectSyncMode` entirely :)
Zombienet tests are failing with an error, so indeed some registration script probably broke.
Actually, the registration seems to work fine; it's just the node that fails to read the genesis data for some reason.
It was due to the fact that I was storing an orchestrator block hash in the main structure itself, which is probably the genesis block in that test. Since 2002 is registered on the fly later, the spawner was not able to get the genesis data. I modified (again 💀) the chain interface to allow fetching the best or finalized head (the relay interface has that too), which should fix the issue. Currently I use the best head to behave like the previous code, but it might be better to use the finalized one?

In practice it shouldn't matter because the values we read only change after 1 session, but I guess using finalized is more consistent with the…
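A toy model of that failure mode and the fix; all types here are illustrative stand-ins, not the actual chain interface:

```rust
use std::collections::HashMap;

type ParaId = u32;
type Hash = u64; // stand-in for a real block hash

struct Orchestrator {
    best: Hash,
    finalized: Hash,
    // (block, para_id) -> genesis data visible at that block
    genesis: HashMap<(Hash, ParaId), Vec<u8>>,
}

impl Orchestrator {
    fn best_hash(&self) -> Hash { self.best }
    fn finalized_hash(&self) -> Hash { self.finalized }
    fn genesis_data(&self, at: Hash, para_id: ParaId) -> Option<Vec<u8>> {
        self.genesis.get(&(at, para_id)).cloned()
    }
}

fn main() {
    let startup_hash: Hash = 0; // the hash stored in the main structure at startup
    let mut genesis = HashMap::new();
    // Chain 2002 is registered on the fly at block 10:
    genesis.insert((10, 2002), vec![1, 2, 3]);
    let orch = Orchestrator { best: 10, finalized: 9, genesis };

    // Reading at the stale startup hash fails...
    assert!(orch.genesis_data(startup_hash, 2002).is_none());
    // ...while fetching the current best head first succeeds. Using
    // finalized_hash() instead would lag by a few blocks, but as noted the
    // values only change once per session.
    assert!(orch.genesis_data(orch.best_hash(), 2002).is_some());
    let _ = orch.finalized_hash();
}
```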
I did create an issue to change just this, but let's do it as a separate PR.
Depends on moondance-labs/dancekit#26

Currently the container chain spawning logic requires an embedded orchestrator client, which prevents using it in data preserver nodes that may interact with the orchestrator through a WebSocket connection. The linked PR modifies the `OrchestratorChainInterface` to add functions to get the necessary data without the need for a client. When collating, a client is still needed; this is fine because a data preserver node will not collate on the orchestrator chain.